AITopics | dataset shift

When machine learning systems face dataset shift, model calibration plays a pivotal role in ensuring their reliability.Calibration error (CE) provides insights into the alignment between the predicted confidence scores and the classifier accuracy.While prior works have delved into the implications of dataset shift on calibration, existing CE estimators either (i) assume access to labeled data from the target domain, often unavailable in practice, or (ii) are derived under a covariate shift assumption.In this work we propose a novel, label-free, consistent CE estimator under label shift. Label shift is characterized by changes in the marginal label distribution p(Y), with a constant conditional p(X|Y) distribution between the source and target. We introduce a novel calibration method, called LaSCal, which uses the estimator in conjunction with a post-hoc calibration strategy, to perform unsupervised calibration on the target distribution. Our thorough empirical analysis demonstrates the effectiveness and reliability of the proposed approach across different modalities, model architectures and label shift intensities.

artificial intelligence, machine learning, proceedings, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift

Stephan Rabanser, Stephan Günnemann, Zachary Lipton

Neural Information Processing SystemsFeb-12-2026, 19:21:17 GMT

This paper explores the problem of building ML systems that failloudly, investigating methods for detecting dataset shift, identifying exemplarsthat most typify the shift, and quantifying shift malignancy. We focus on severaldatasets and various perturbations to both covariates and label distributions withvarying magnitudes and fractions of data affected. Interestingly, we show thatacross the dataset shifts that we explore, a two-sample-testing-based approach,using pre-trained classifiers for dimensionality reduction, performs best.

classifier, data mining, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > UAE (0.06)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Industry: Health & Medicine (0.94)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

d3d9446802a44259755d38e6d163e820-Supplemental.pdf

Neural Information Processing SystemsFeb-10-2026, 13:29:25 GMT

calibration, datashift, intensity, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

d3d9446802a44259755d38e6d163e820-Paper.pdf

Neural Information Processing SystemsFeb-10-2026, 13:29:17 GMT

calibration, neural network, prediction, (15 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.94)

Add feedback

41a6fd31aa2e75c3c6d427db3d17ea80-Paper.pdf

Neural Information Processing SystemsFeb-8-2026, 09:17:51 GMT

architecture, base learner, ensemble, (13 more...)

Neural Information Processing Systems

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
Europe > Germany > Baden-Württemberg > Freiburg (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(4 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine (0.68)
Education (0.46)

Add feedback

Can you trust your model's uncertainty? Evaluating predictive uncertainty under dataset shift

Neural Information Processing SystemsDec-25-2025, 16:20:20 GMT

Modern machine learning methods including deep learning have achieved great success in predictive accuracy for supervised learning tasks, but may still fall short in giving useful estimates of their predictive uncertainty. Quantifying uncertainty is especially critical in real-world settings, which often involve input distributions that are shifted from the training distribution due to a variety of factors including sample bias and non-stationarity. In such settings, well calibrated uncertainty estimates convey information about when a model's output should (or should not) be trusted. Many probabilistic deep learning methods, including Bayesian-and non-Bayesian methods, have been proposed in the literature for quantifying predictive uncertainty, but to our knowledge there has not previously been a rigorous large-scale empirical comparison of these methods under dataset shift. We present a large-scale benchmark of existing state-of-the-art methods on classification problems and investigate the effect of dataset shift on accuracy and calibration. We find that traditional post-hoc calibration does indeed fall short, as do several other previous methods. However, some methods that marginalize over models give surprisingly strong results across a broad spectrum of tasks.

dataset shift, name change, predictive uncertainty, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.83)

Add feedback

Evaluating Robustness to Dataset Shift via Parametric Robustness Sets

Neural Information Processing SystemsDec-24-2025, 09:56:50 GMT

We give a method for proactively identifying small, plausible shifts in distribution which lead to large differences in model performance. These shifts are defined via parametric changes in the causal mechanisms of observed variables, where constraints on parameters yield a robustness set of plausible distributions and a corresponding worst-case loss over the set. While the loss under an individual parametric shift can be estimated via reweighting techniques such as importance sampling, the resulting worst-case optimization problem is non-convex, and the estimate may suffer from large variance. For small shifts, however, we can construct a local second-order approximation to the loss under shift and cast the problem of finding a worst-case shift as a particular non-convex quadratic optimization problem, for which efficient algorithms are available. We demonstrate that this second-order approximation can be estimated directly for shifts in conditional exponential family models, and we bound the approximation error. We apply our approach to a computer vision task (classifying gender from images), revealing sensitivity to shifts in non-causal attributes.

dataset shift, name change, parametric robustness set, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Vision (0.98)

Add feedback

Where are we with calibration under dataset shift in image classification?

Roschewitz, Mélanie, Mehta, Raghav, Ribeiro, Fabio de Sousa, Glocker, Ben

arXiv.org Artificial IntelligenceOct-23-2025

We conduct an extensive study on the state of calibration under real-world dataset shift for image classification. Our work provides important insights on the choice of post-hoc and in-training calibration techniques, and yields practical guidelines for all practitioners interested in robust calibration under shift. We compare various post-hoc calibration methods, and their interactions with common in-training calibration strategies (e.g., label smoothing), across a wide range of natural shifts, on eight different classification tasks across several imaging domains. We find that: (i) simultaneously applying entropy regularisation and label smoothing yield the best calibrated raw probabilities under dataset shift, (ii) post-hoc calibrators exposed to a small amount of semantic out-of-distribution data (unrelated to the task) are most robust under shift, (iii) recent calibration methods specifically aimed at increasing calibration under shifts do not necessarily offer significant improvements over simpler post-hoc calibration methods, (iv) improving calibration under shifts often comes at the cost of worsening in-distribution calibration. Importantly, these findings hold for randomly initialised classifiers, as well as for those finetuned from foundation models, the latter being consistently better calibrated compared to models trained from scratch. Finally, we conduct an in-depth analysis of ensembling effects, finding that (i) applying calibration prior to ensembling (instead of after) is more effective for calibration under shifts, (ii) for ensembles, OOD exposure deteriorates the ID-shifted calibration trade-off, (iii) ensembling remains one of the most effective methods to improve calibration robustness and, combined with finetuning from foundation models, yields best calibration results overall.

calibration, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2507.0778

Country: